Learning Naive Bayes Classifier from Noisy Data
Abstract
Classification is one of the major tasks in knowledge discovery and data mining. The naive Bayes classifier, despite its simplicity, has proven surprisingly effective in many practical applications. In real datasets, noise is inevitable because of imprecise measurement or privacy-preserving mechanisms. In this paper, we develop a new approach, the LinEar-Equation-based noise-aWare bAYes classifier (LEEWAY), for learning the underlying naive Bayes classifier from noisy observations. Using systems of linear equations and optimization methods, LEEWAY reconstructs the underlying probability distributions of the noise-free dataset from the given noisy observations. Incorporating the noise model into the learning process improves classification accuracy. Furthermore, because the reconstructed model is an estimate of the underlying naive Bayes classifier for the noise-free dataset, it can easily be combined with new observations corrupted at different noise levels to obtain good predictive accuracy. Several experiments evaluate the performance of LEEWAY. The results show that LEEWAY handles noisy data effectively and provides higher classification accuracy than traditional approaches.
Keywords: naive Bayes classifier, noisy data, classification, Bayesian network.
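The abstract does not spell out LEEWAY's equations, so the following is only a minimal sketch of the general idea of recovering a clean distribution from a noisy one through a linear system. It assumes a categorical attribute and a known noise matrix, where the hypothetical `noise[i][j]` is the probability that true value i is observed as value j. The names `solve_linear_system` and `reconstruct_distribution` are illustrative and do not come from the paper, and the clip-and-renormalize step is a simple stand-in for the optimization methods the abstract mentions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not mutated.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("noise matrix is singular; distribution not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def reconstruct_distribution(noise, observed):
    """Estimate clean probabilities p from observed q, where
    q[j] = sum_i p[i] * noise[i][j] (assumed noise model)."""
    n = len(observed)
    # Transpose so that each equation corresponds to one observed value j.
    coeffs = [[noise[i][j] for i in range(n)] for j in range(n)]
    raw = solve_linear_system(coeffs, observed)
    # Sampling error can push estimates slightly below 0; clip and renormalize.
    clipped = [max(x, 0.0) for x in raw]
    total = sum(clipped)
    return [x / total for x in clipped]


# Hypothetical example: a binary attribute flipped with probability 0.2.
noise = [[0.8, 0.2],
         [0.2, 0.8]]
observed = [0.62, 0.38]  # observed value frequencies within one class
clean = reconstruct_distribution(noise, observed)  # approximately [0.7, 0.3]
```

In a naive Bayes setting, a reconstruction like this would be repeated for each attribute within each class, and the recovered conditional probabilities would then replace the noisy frequency estimates.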
Related resources
Sisterhood of Classifiers: A Comparative Study of Naive Bayes and Noisy-or Networks
Classification is a task central to many machine learning problems. In this paper we examine two Bayesian network classifiers, the naive Bayes and the noisy-or models. They are of particular interest because of their simple structures. We compare them on two dimensions: expressive power and ability to learn. As it turns out, naive Bayes, noisy-or, and logistic regression classifiers all have eq...
An Improved Naive Bayes Classifier-based Noise Detection Technique for Classifying User Phone Call Behavior
The presence of noisy instances in mobile phone data is a fundamental issue for classifying user phone call behavior (i.e., accept, reject, missed and outgoing), with many potential negative consequences. The classification accuracy may decrease and the complexity of the classifiers may increase due to the number of redundant training samples. To detect such noisy instances from a training data...
Incremental Weighted Naive Bayes Classifiers for Data Stream
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with a naive independence assumption. The explanatory variables (Xi) are assumed to be conditionally independent of one another given the target variable (Y). Despite this strong assumption, this classifier has proved very effective in many real applications and is often used on data streams for supervised classification. The n...
Predicting carcinoid heart disease with the noisy-threshold classifier
OBJECTIVE: To predict the development of carcinoid heart disease (CHD), which is a life-threatening complication of certain neuroendocrine tumors. To this end, a novel type of Bayesian classifier, known as the noisy-threshold classifier, is applied. MATERIALS AND METHODS: Fifty-four cases of patients that suffered from a low-grade midgut carcinoid tumor, of which 22 patients developed CHD, were...
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Publication date: 2003